# RSA HARDWARE ACCELERATOR



| Group   | Group 15              |
|---------|-----------------------|
| Authors | Sondre Pettersen      |
|         | Sturla Østeby Sletner |
|         | Nikolai Markussen     |
| Date    | 29.09.2025            |

# INSTRUCTIONS

Fill out all parts of this document that are marked in green.

#### INTRODUCTION

This document contains the requirements, design specification and test plan for an RSA encryption circuit. The document also specifies key milestones, deliverables and the criteria used for evaluating the work of the group.

This document is written in such a way that it facilitates quick and efficient evaluation of the work done by each group and is not a template for how to write a typical project thesis or master thesis report.

#### **CODE OF HONOR**

We hereby declare that this design has been developed by us. This means that the high-level model, the microarchitecture, the RTL code and the testbench code have all been developed by the team.

Papers we have read that, for example, describe different ways of doing modular exponentiation are listed in the reference section.

We understand that attempts at plagiarism can result in the grade "F".



Signature of all team members

#### **DESIGN REQUIREMENTS**

The design requirements are shown in Table 1. The requirements have been divided into functional requirements (FUNC), requirements for performance, power and area (PPA), interface requirements (INT) and configuration requirements (CONF).

A priority is given for each requirement. The rightmost column contains a checkbox. Write **OK** in it if your design has met the corresponding requirement.

Table 1. RSA Hardware accelerator design requirements

| Requirement ID | Priority | Description                                                                                                                                                                                            | Check |  |
|----------------|----------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------|--|
| REQ_FUNC_01    | MUST     | The design must implement a function that can compute modular exponentiation $X = Y^k \mod n$                                                                                                          |       |  |
| REQ_FUNC_02    | MUST     | The design must be able to encrypt and decrypt message blocks using modular exponentiation:<br>Encryption: $C = M^e \mod n$, $M < n$, $C < n$, $e < n$<br>Decryption: $M = C^d \mod n$, $M < n$, $C < n$, $d < n$ |       |  |
| REQ_PPA_01     | MUST     | Encrypt/decrypt a message of length 256 bits as fast as possible.                                                                                                                                      |       |  |
| REQ_PPA_02     | MUST     | The design must fit inside the Zynq XC7Z020 FPGA on the Digilent Pynq-Z1 board.                                                                                                                        |       |  |
| REQ_PPA_03     | MUST     | There is no requirement for the clock frequency of the programmable logic. The platform supports any clock frequency.                                                                                  |       |  |
| REQ_PPA_04     | SHOULD   | The hardware accelerator should run testcase 4 faster than 400 ms.                                                                                                                                     |       |  |
| REQ_INT_01     | MUST     | The RSA design must be integrated as a hardware accelerator inside the Zynq SoC. It must be managed by the CPU and made accessible through the Jupyter notebook interface.                             |       |  |
| REQ_INT_02     | SHOULD   | The design should implement memory mapped status registers, performance counters and other mechanisms for debugging of features and performance at system level.                                       |       |  |
| REQ_INT_03     | MUST     | The design must have one AXI-Lite Slave interface to enable access of memory-mapped registers.                                                                                                         |       |  |
| REQ_INT_04     | MUST     | The design must have one AXI stream slave interface for input messages that shall be encrypted(decrypted) and one AXI stream master interface for output messages that have been encrypted(decrypted). |       |  |
| REQ_CONF_01    | SHOULD   | The design should be optimized for 256 bit block/message/key size.                                                                                                                                     |       |  |

# DEVELOPMENT, DOCUMENTATION AND CODE REQUIREMENTS

This document has many sections that the group must fill out. These sections are all marked in green. In addition to this document, the group shall also submit model code, RTL code for the design and code for the verification environments. These requirements are captured in Table 2.

The rightmost column contains a checkbox. Write **OK** in it if your group has met the corresponding requirement.

Table 2. RSA Hardware accelerator documentation and code requirements

| Requirement ID | Priority | Description                                                                                                    | Check |  |
|----------------|----------|----------------------------------------------------------------------------------------------------------------|-------|--|
| REQ_DEV_01     | MUST     | The development is broken down into milestones. The group must deliver the milestones on time.                 |       |  |
| REQ_DOC_01     | MUST     | All green parts of this document must be filled out.                                                           |       |  |
| REQ_DOC_02     | MUST     | This document must contain information about algorithm used for computing modular multiplication.              |       |  |
| REQ_DOC_03     | MUST     | This document must contain description of the design including microarchitecture diagrams.                     |       |  |
| REQ_DOC_04     | MUST     | This document must contain verification plan.                                                                  |       |  |
| REQ_DOC_05     | MUST     | This document must contain results from performance measurements.                                              |       |  |
| REQ_CODE_01    | MUST     | RTL code for the design must be attached to the final delivery bundle.                                         |       |  |
| REQ_CODE_02    | MUST     | Code for the testbench(es) developed by the group must be attached to the final delivery bundle.               |       |  |
| REQ_CODE_03    | MUST     | High level model code (Python, Matlab, C++) developed by the group must be attached to the final delivery bundle. |       |  |

## **MILESTONES**

A considerable amount of work and effort is needed to develop an RSA encryption circuit. The development is therefore split into a set of milestones, as listed in Table 3.

The rightmost column contains a checkbox. Write **OK** in it if your group has met the corresponding milestone.

Table 3. Term project schedule and milestones

| Milestone                                           | Date   | Delivery instructions                                                       | Description                                                                                                 | Check |
|-----------------------------------------------------|--------|-----------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------|-------|
| Form groups                                         | AUG 26 | Sign up on Blackboard                                                       | Form term project groups                                                                                    | X     |
| Study algorithms and pick one                       | SEP 10 | Nothing to upload                                                           | Study algorithms and pick one                                                                               | X     |
| High level model                                    | SEP 17 | Upload code on Blackboard                                                   | Implement the algorithm in Python or another high level language.                                           | X     |
| Microarchitecture                                   | OCT 3  | Upload diagram on Blackboard                                                | Draw microarchitecture diagram for hardware design in this datasheet.                                       | X     |
| Performance estimate                                |        | Estimate performance. Upload to Blackboard.                                 | Estimate the time needed to encrypt/decrypt a block, in this datasheet.                                     | X     |
| Microarchitecture review/presentation               | OCT 3  | Give presentation in class.                                                 | Staff and fellow students (peers) review the solutions proposed by each team and give feedback.             | X     |
| RTL Code (Alpha)                                    | OCT 29 | Upload RTL code to Blackboard.                                              | Write synthesizable register transfer level code for the RSA design. Include testbenches in the submission. |       |
| Working on FPGA (Alpha)                             | NOV 12 | Upload PPA on Blackboard.                                                   | Design working on FPGA.                                                                                     |       |
| Hand in this document and all pieces of source code | NOV 21 | Upload this document together with all pieces of source code on Blackboard. | Hand in this document                                                                                       |       |

# DESIGN AND VERIFICATION PROCESS

When developing a hardware design, it is important to follow these steps:

#### 1) Capture, understand and analyze all requirements.

#### 2) Design exploration:

- Create a high level model that allows you to quickly and easily compute functionally correct output for a given set of inputs.
- Come up with a way to efficiently search through the design space in order to find the design that satisfies the requirements.
- Evaluate and improve the PPA of different alternative solutions.

#### 3) Write design specification:

- Describe the design you intend to make
- Draw microarchitecture diagrams
- Clearly define interfaces between modules in the design

#### 4) Design and verification:

- Write RTL code according to the design specification
- Verify that the design is working using testbenches and other verification environments

#### 5) Implement the design:

- Synthesize the design
- Run Place & Route

#### 6) Test on FPGA

- Run performance benchmarks on FPGA prototype platform

During the work with the design, verification and implementation of the RSA encryption circuit, you will go through all these phases.

# HIGH LEVEL MODEL CODE (9 POINTS)

<Create a high level model of the algorithm(s) you used for modular multiplication and modular exponentiation.>

```
# High-level RSA model
# Need to implement:
# Encryption: C = M^e mod n
# Decryption: M = C^d mod n
# where C is the encrypted message, M is the message,
# and e/d are the public/private exponents.
import math
import secrets

from sympy import randprime

M = secrets.randbits(256)
C = 0
blakley_counter = 0  # counts calls to blakley_mul


def rsa_key_generation():
    p = randprime(2**127, 2**128)
    q = randprime(2**127, 2**128)
    n = p * q
    phi_n = (p - 1) * (q - 1)
    e = 65537
    if math.gcd(e, phi_n) != 1:
        # e has no inverse modulo phi(n); pow() below raises ValueError in this case
        print("Check")
    d = pow(e, -1, phi_n)
    print("Phi(n):", phi_n)
    print("Public key:", e, n)
    print("Private key:", d, n)
    return e, d, n


# This part will be implemented in VHDL
def blakley_mul(a, b, n):
    """Compute (a * b) mod n by scanning the bits of b from MSB to LSB."""
    global blakley_counter
    blakley_counter += 1
    r = 0
    for i in range(b.bit_length() - 1, -1, -1):
        r = (r << 1) % n
        if (b >> i) & 1:
            r = (r + a) % n
    return r


def modexp_RL_method(base, exponent, n):
    """Right-to-left binary modular exponentiation: base^exponent mod n."""
    result = 1
    base = base % n
    e = exponent
    while e > 0:
        if e & 1:
            result = blakley_mul(result, base, n)
        base = blakley_mul(base, base, n)
        e = e >> 1
    return result


e, d, n = rsa_key_generation()
C = modexp_RL_method(M, e, n)
print("blakley_counter:", blakley_counter)
print("Original message:", M)
print("Encrypted message:", C)
M_decrypted = modexp_RL_method(C, d, n)
print("Decrypted message:", M_decrypted)
if (M % n) == M_decrypted:
    print("Success!")
else:
    print("Not success :(")
```

Figure 1. High level model of modular multiplication and modular exponentiation.

#### <Describe your high level model>

This high level model uses the right-to-left (RL) binary method for modular exponentiation, with two Blakley multiplications per exponent bit: one for the conditional multiply and one for the squaring. The plan is to run these two in parallel in the microarchitecture, but after feedback from the presentation we will consider reusing a single Blakley module instead.
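Since the Blakley multiplier is the part intended for VHDL, it can help to model it without the `%` operator. The following is a minimal sketch, not the group's RTL: it assumes `0 <= a < n` and a fixed operand width, and the name `blakley_mul_hw` and the `width` parameter are introduced here for illustration only. Because `r < n` at the start of each iteration, the shifted and accumulated value stays below `3n`, so two conditional subtractions are enough to reduce it.

```python
def blakley_mul_hw(a, b, n, width=256):
    """Blakley modular multiplication using only shift, add, compare and subtract.

    Hypothetical helper for illustration. Assumes 0 <= a < n and 0 <= b < 2**width.
    """
    r = 0
    for i in range(width - 1, -1, -1):
        r = r << 1              # r < n before the shift, so r < 2n here
        if (b >> i) & 1:
            r = r + a           # a < n, so r < 3n here
        # Two conditional subtractions bring r back into the range [0, n)
        if r >= n:
            r = r - n
        if r >= n:
            r = r - n
    return r


# Small usage example: 7 * 5 mod 11
print(blakley_mul_hw(7, 5, 11, width=8))
```

Each loop iteration maps to one shift, one conditional add and two compare/subtract steps, which is a convenient unit when counting cycles per bit.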

## SYSTEM OVERVIEW

The RSA encryption platform consists of a hardware design and a software driver stack that enables the user to interact with the hardware.

The hardware is implemented on a PYNQ-Z1 [1,2] development board. This board is equipped with a Xilinx ZYNQ-7020 [3] system on chip. The ZYNQ contains a processing subsystem with two Arm CPUs and a programmable logic part. Our RSA accelerator is placed within the programmable logic. It is connected to the processing system through an AXI [4,5] interconnect as shown in Figure 2.



Figure 2. Software and hardware components of the RSA encryption platform.

## FLOW CONTROL THROUGH VALID/READY HANDSHAKING

In a digital system, such as the one we are going to construct, data is transferred from block to block. It is important that data is transferred in such a way that no block gets ahead of the others, for example by sending data before the receiver is ready to accept it. Some form of flow control is therefore necessary.

One very common flow control protocol is valid/ready handshaking. The protocol is illustrated in Figure 3 and Figure 4 (see also [6], page 480).



Figure 3. Sender and Receiver exchanging data.



Figure 4. Valid - Ready handshaking. Timing diagram.

When a sender wants to send data to a receiver, it signals that **data** is present and valid by asserting the **valid** signal. When the receiver can receive data, it signals this by setting the **ready** signal high. The **data** is transferred from the sender to the receiver on the first positive edge of the clock where both the **valid** signal and the **ready** signal are high at the same time.

At the transfer of **A** in Figure 4 above, the sender had to wait for the **ready** signal of the receiver. When **B** and **C** were transferred, the receiver was **ready** and waiting for the sender to send data. When both **ready** and **valid** remain high, a new datum is transferred in every cycle (this is the case with **D**).

If the valid signal is high and the ready signal is low, the sender must hold valid and data unchanged until the ready signal has become high.

All interfaces between modules in this project that need flow control are based on valid/ready handshaking. It is also the protocol used for transferring data on AXI interfaces.
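The handshake rules above can be illustrated with a small cycle-based model. This is a sketch under simplifying assumptions, not a reproduction of Figure 4: the function `simulate_handshake` and the patterns used below are made up for illustration, and each list index stands for one rising clock edge.

```python
def simulate_handshake(items, valid_pattern, ready_pattern):
    """Cycle-based model of one valid/ready link (illustrative only).

    items: data the sender wants to transfer, in order.
    valid_pattern[c]: True if the sender is willing to start offering data in cycle c.
    ready_pattern[c]: True if the receiver can accept data in cycle c.
    Returns a list of (cycle, datum) for every completed transfer.
    """
    transfers = []
    pending = list(items)
    valid = False
    for cycle, (want, ready) in enumerate(zip(valid_pattern, ready_pattern)):
        # Once valid is raised it stays high, with the same datum,
        # until the transfer has happened.
        valid = bool(pending) and (valid or want)
        if valid and ready:
            transfers.append((cycle, pending.pop(0)))
            valid = False
    return transfers


T, F = True, False
result = simulate_handshake(
    ["A", "B", "C", "D"],
    valid_pattern=[T, T, T, F, T, F, T, T],
    ready_pattern=[F, F, T, T, T, T, T, T],
)
print(result)  # [(2, 'A'), (4, 'B'), (6, 'C'), (7, 'D')]
```

In this run, A waits two cycles for the receiver, B and C are taken as soon as the sender offers them, and D follows C back to back.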

#### RSA CORE INTERFACE

The **RSA ACCELERATOR** from Figure 2 is shown in more detail in Figure 5. The **rsa\_core** block in the middle is the block that does the modular exponentiation calculations. This is the module that you are going to implement as a part of the term project in TFE4141 Design of digital systems 1. The other blocks (rsa\_regio, rsa\_msgin and rsa\_msgout) are already made.



Figure 5. Main blocks within the RSA ACCELERATOR

The **rsa\_regio** unit contains key registers. These registers can be written and read by a master in the system through the AXI master interface. The keys are sent out of the **rsa\_regio** module to the **rsa\_core** module, where they are used during the encryption process. The **rsa\_status** signal comes from the **rsa\_core** and is written to one of the registers. This can be used by the CPU to retrieve information about the status of the rsa\_accelerator. It is up to the group to decide what status information could be interesting.

Messages that will be encrypted/decrypted are sent into the **rsa\_core** from the **rsa\_msgin** block in a continuous stream (**msgin\_\***). The results are sent from the **rsa\_core** to the **rsa\_msgout** block through another stream (**msgout\_\***). The diagram in Figure 6 shows how messages are sent in and out of rsa\_core.

The message M<n> on msgin\_data is transferred from the sender (rsa\_msgin) to the receiver (rsa\_core) on the first rising edge of clk when msgin\_valid and msgin\_ready are both high at the same time. The msgin\_last signal indicates whether M<n> is the last message in the stream or not.

The message C<n> on msgout\_data is transferred from the sender (rsa\_core) to the receiver (rsa\_msgout) on the first rising edge of clk when msgout\_valid and msgout\_ready are both high at the same time. The msgout\_last signal indicates whether C<n> was the last message in the stream or not. It must therefore be identical to the value msgin\_last had during the transfer of M<n>.
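The relationship between the two streams can be sketched as a purely behavioural model: one output message per input message, in order, with `msgout_last` copied from `msgin_last`. The generator `rsa_core_model` and the tiny textbook key (n = 3233, e = 17, d = 2753) are assumptions chosen for illustration; timing and handshaking are deliberately left out.

```python
def rsa_core_model(msgin_stream, key, n):
    """Behavioural model of rsa_core (illustrative only).

    msgin_stream: iterable of (msgin_data, msgin_last) pairs.
    Yields (msgout_data, msgout_last) pairs in the same order.
    """
    for msgin_data, msgin_last in msgin_stream:
        msgout_data = pow(msgin_data, key, n)  # modular exponentiation
        yield msgout_data, msgin_last          # msgout_last mirrors msgin_last


# Toy key pair for demonstration only: n = 61 * 53
n, e, d = 3233, 17, 2753
messages = [(65, False), (123, False), (2024, True)]

encrypted = list(rsa_core_model(messages, e, n))
decrypted = list(rsa_core_model(encrypted, d, n))
print(decrypted == messages)  # True
```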



Figure 6. Message transport in and out of rsa\_core.

# RSA CORE MICROARCHITECTURE (20 POINTS)

<This chapter must contain one or more diagrams that illustrates the microarchitecture of the rsa\_core. Also add a description of the design>



This microarchitecture implements the previously mentioned Blakley multipliers inside the modular exponentiation. When a message is received, it is first reduced modulo n and then sent to the Blakley modules. The Blakley modules are run once for each bit in the message, and blakley\_mul\_1 is only run when the current bit of e is 1.

## PERFORMANCE ESTIMATION (8 POINTS)

<Estimate the number of clock cycles your system needs in order to encrypt/decrypt a message (worst case). Estimate the likely clock frequency of your design as well. >

#### **Assumptions:**

Modulo: 2 cycles

Add/sub: 1 cycle

Shift: 1 cycle

Load into register: 1 cycle

Equation: 2 \* (e\_bits \* (m\_load + 256 \* blakley\_mul\_2))

- e\_bits = 16: number of bits in the exponent
- m\_load = 4: top part of the microarchitecture, which loads messages into a register and does the modulo
- blakley\_mul\_2 = 7: number of times the algorithm has to be run in this implementation

Result: 32 832 cycles per encryption
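To make the estimate easy to re-check when the assumptions change, the equation can be written as a small function. This is a sketch only; the function name `estimate_cycles` and the parameters `operand_bits` and `factor` are introduced here for illustration. Evaluated literally with the values listed above, the equation gives 57 472 cycles rather than 32 832, so the inputs are kept as explicit arguments.

```python
def estimate_cycles(e_bits, m_load, blakley_mul_2, operand_bits=256, factor=2):
    """Cycle estimate: factor * (e_bits * (m_load + operand_bits * blakley_mul_2))."""
    return factor * (e_bits * (m_load + operand_bits * blakley_mul_2))


# Values taken from the assumptions listed in this section
cycles = estimate_cycles(e_bits=16, m_load=4, blakley_mul_2=7)
print(cycles)  # 57472
```

The same function can be reused for the decryption case by passing the bit length of the private exponent as `e_bits`.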

#### VERIFICATION PLAN AND VERIFICATION SUMMARY (10 POINTS)

<Describe the verification goals and the verification environments you put in place to meet these goals. Summarize the verification results.>

#### SYNTHESIS AND IMPLEMENTATION RESULTS (20 POINTS)

<Present area/utilization, max frequency, power consumption, for your design after synthesis>

<Present area/utilization, max frequency, power consumption, for your design after implementation>

<Describe to what extent the design is fully working on the FPGA. If not fully working, discuss why not>

<If the design works on the FPGA you will receive at least 15 points>

#### PERFORMANCE BENCHMARKING ON FPGA (15 POINTS)

<Present the performance benchmark results from FPGA runs. Include the performance graph from the Jupyter notebook and populate the tables>

<The faster the circuit is, the more points you will get. For instance, if you end up in the main part of the Hall of Fame, you get full score>

Table 4. Number of clock cycles spent while running the different testcases.

| Testcase      | T0             | T1             | T2             | T3             | T4             | T5             |
|---------------|----------------|----------------|----------------|----------------|----------------|----------------|
| Type          | ENCR           | ENCR           | ENCR           | DECR           | DECR           | DECR           |
| Blocks        | 504            | 7056           | 144            | 504            | 7056           | 144            |
| <HW config 1> | <clock cycles> | <clock cycles> | <clock cycles> | <clock cycles> | <clock cycles> | <clock cycles> |
| <HW config 2> | <clock cycles> | <clock cycles> | <clock cycles> | <clock cycles> | <clock cycles> | <clock cycles> |

Table 5. Runtime (in ms) for the different testcases.

| Configuration | Frequency | T0              | T1              | T2              | T3              | T4              | T5              |
|---------------|-----------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|
| SW            | -         | <runtime in ms> | <runtime in ms> | <runtime in ms> | <runtime in ms> | <runtime in ms> | <runtime in ms> |
| <HW config 1> | <freq>    | <runtime in ms> | <runtime in ms> | <runtime in ms> | <runtime in ms> | <runtime in ms> | <runtime in ms> |
| <HW config 2> | <freq>    | <runtime in ms> | <runtime in ms> | <runtime in ms> | <runtime in ms> | <runtime in ms> | <runtime in ms> |

# SOURCE CODE QUALITY (9 POINTS)

- <Attach the model code, RTL code and testbench code as a part of the delivery bundle>
- <Describe how the files in the zip file are organized (e.g. folder structure)>
- <Define the RTL coding rules you have tried to follow while writing the RTL code>

# DISCUSSION ON SUSTAINABILITY (9 POINTS)

<Discuss how cryptography in general and your RSA implementation in particular have impact on sustainability as defined in the UN goals>

# **EVALUATION CRITERIA**

The evaluation of your term project will be based on this datasheet in addition to the attachments.

| Model algorithm                            | 9 points   |
|--------------------------------------------|------------|
| Microarchitecture                          | 20 points  |
| Performance estimation                     | 8 points   |
| Verification plan and verification summary | 10 points  |
| Synthesis and implementation results       | 20 points  |
| Performance benchmarking on FPGA           | 15 points  |
| Source code quality                        | 9 points   |
| Discussion on sustainability               | 9 points   |
| TOTAL                                      | 100 POINTS |

# REFERENCES

[1] PYNQ-Z1 board by Digilent,

https://store.digilentinc.com/pynq-z1-python-productivity-for-zynq-7000-arm-fpga-soc/

[2] List of other compatible PYNQ boards,

http://www.pynq.io/board.html

[3] Xilinx ZYNQ-7000 SoC

https://www.xilinx.com/products/silicon-devices/soc/zynq-7000.html

[4] AMBA Specification

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ihi0022b/index.html

[5] Vivado Design Suite, AXI Reference guide

https://www.xilinx.com/support/documentation/ip_documentation/axi_ref_guide/latest/ug1037-vivado-axi-reference-guide.pdf

[6] Dally, W. J., Curtis Harting, R. and Aamodt, T. M., *Digital design using VHDL: a systems approach*. (Cambridge: Cambridge University Press, 2016)